Assessing the Impact of Socio-economic Factors on Presidential Election Voting in the USA in 2016

Author: Zilu Wang

Affiliation: University of Glasgow

1 Introduction

The 2016 US presidential election was one of the most dramatic and surprising elections in US history. Republican nominee Donald Trump won the presidency with 304 electoral votes, compared to Democratic nominee Hillary Clinton’s 227. These are the votes actually cast by the electors; the pledged totals implied by the state results were 306 and 232. Trump became the first president without prior political or military experience. He also became the fifth president to win office despite losing the popular vote, having received almost 3 million fewer votes than Clinton.

Donald Trump was known for his controversial statements and policies, which appealed to a significant portion of the American population. His victory in the 2016 election raised questions about the factors that influenced the voting patterns of the American electorate. In this analysis, we explore the relationship between socio-economic factors and voting patterns in the 2016 presidential election. We investigate how demographic, economic, and educational indicators at the county level may have influenced the voting outcomes.

To make this paper easier to follow, we briefly summarise how the US presidential election works. The election is indirect: voters cast ballots for a slate of members of the Electoral College, and these electors then vote directly for the President and Vice President. Each state is allocated a number of electors equal to its total number of Senators and Representatives in Congress, and the District of Columbia receives three, giving a total of 538 electors. Most states use a “winner-takes-all” system in which the candidate who receives the most popular votes in the state wins all of its electoral votes. A candidate needs a majority of at least 270 electoral votes to win the presidency. The Electoral College system has been a subject of debate because a candidate can win the presidency without winning the popular vote, as happened in the 2016 election.
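The elector arithmetic described above can be checked with a few lines of Python. This is only an illustrative sketch: the seat counts are the standard constitutional figures rather than values taken from the project datasets, and `state_electors` is a hypothetical helper name.

```python
# Seat counts are constitutional constants, not values from the project data.
SENATORS = 100          # two per state
REPRESENTATIVES = 435   # apportioned by population
DC_ELECTORS = 3         # granted to the District of Columbia

total_electors = SENATORS + REPRESENTATIVES + DC_ELECTORS
majority = total_electors // 2 + 1   # smallest number strictly above one half


def state_electors(house_seats: int) -> int:
    """Electors for one state: its two Senators plus its House seats."""
    return 2 + house_seats


print(total_electors, majority)
print(state_electors(53))  # California held 53 House seats in 2016
```

The majority threshold is computed rather than hard-coded, which makes it clear why 270 is the smallest winning total out of 538.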

The main datasets used in this project are the 2016 US presidential election results and the 2014 socio-economic data from the US Census Bureau. The election results dataset contains the number of votes received by each candidate in each county, the total number of votes, and the fraction of votes received by the Republican candidate. The demographic and socio-economic indicators dataset contains indicators such as population, racial demographics, educational attainment, and median household income for each county. A third file, the dictionary dataset, describes the columns of the demographic and socio-economic indicators dataset.

There are three main research questions that we aim to address in this analysis:

1. Are there specific socio-economic or demographic factors that are associated with an increased or decreased preference for a political party in a county?
2. Are there state-wide factors that are associated with a preference for one political party over another?
3. How well can the model associating socio-economic factors with the 2016 election results be used to predict the final state-wide outcome of the 2016 presidential election?

The analysis will be conducted in several stages, including data inspection and pre-processing, exploratory data analysis, and predictive modeling. Various statistical and machine learning techniques will be applied to explore the relationship between socio-economic factors and voting patterns in the 2016 presidential election. The results of this analysis will provide insights into the factors that may have influenced the voting outcomes in the 2016 election and contribute to a better understanding of the dynamics of US presidential elections.

2 Methodology

2.1 Exploratory Data Analysis

2.1.1 Data Inspection and Pre-processing

# A tibble: 6 × 9
  state   state.po county   FIPS candidatevotesR candidatevotesD totalvotes
  <chr>   <chr>    <chr>   <dbl>           <dbl>           <dbl>      <dbl>
1 Alabama AL       Autauga  1001           18172            5936      24973
2 Alabama AL       Baldwin  1003           72883           18458      95215
3 Alabama AL       Barbour  1005            5454            4871      10469
4 Alabama AL       Bibb     1007            6738            1874       8819
5 Alabama AL       Blount   1009           22859            2156      25588
6 Alabama AL       Bullock  1011            1140            3530       4710
# ℹ 2 more variables: fracvotesR <dbl>, partywonR <dbl>
# A tibble: 6 × 53
   fips area_name     state_abbreviation PST045214 PST120214 POP010210 AGE135214
  <dbl> <chr>         <chr>                  <dbl>     <dbl>     <dbl>     <dbl>
1  1000 Alabama       <NA>                 4849377       1.4   4779736       6.1
2  1001 Autauga Coun… AL                     55395       1.5     54571       6  
3  1003 Baldwin Coun… AL                    200111       9.8    182265       5.6
4  1005 Barbour Coun… AL                     26887      -2.1     27457       5.7
5  1007 Bibb County   AL                     22506      -1.8     22915       5.3
6  1009 Blount County AL                     57719       0.7     57322       6.1
# ℹ 46 more variables: AGE295214 <dbl>, AGE775214 <dbl>, SEX255214 <dbl>,
#   RHI125214 <dbl>, RHI225214 <dbl>, RHI325214 <dbl>, RHI425214 <dbl>,
#   RHI525214 <dbl>, RHI625214 <dbl>, RHI725214 <dbl>, RHI825214 <dbl>,
#   POP715213 <dbl>, POP645213 <dbl>, POP815213 <dbl>, EDU635213 <dbl>,
#   EDU685213 <dbl>, VET605213 <dbl>, LFE305213 <dbl>, HSG010214 <dbl>,
#   HSG445213 <dbl>, HSG096213 <dbl>, HSG495213 <dbl>, HSD410213 <dbl>,
#   HSD310213 <dbl>, INC910213 <dbl>, INC110213 <dbl>, PVY020213 <dbl>, …

The first thing we need to check is whether there is any missing data in the datasets.

          state        state.po          county            FIPS candidatevotesR 
              0               0               0               0               0 
candidatevotesD      totalvotes      fracvotesR       partywonR 
              0               0               0               0 
              fips          area_name state_abbreviation          PST045214 
                 0                  0                 51                  0 
         PST120214          POP010210          AGE135214          AGE295214 
                 0                  0                  0                  0 
         AGE775214          SEX255214          RHI125214          RHI225214 
                 0                  0                  0                  0 
         RHI325214          RHI425214          RHI525214          RHI625214 
                 0                  0                  0                  0 
         RHI725214          RHI825214          POP715213          POP645213 
                 0                  0                  0                  0 
         POP815213          EDU635213          EDU685213          VET605213 
                 0                  0                  0                  0 
         LFE305213          HSG010214          HSG445213          HSG096213 
                 0                  0                  0                  0 
         HSG495213          HSD410213          HSD310213          INC910213 
                 0                  0                  0                  0 
         INC110213          PVY020213          BZA010213          BZA110213 
                 0                  0                  0                  0 
         BZA115213          NES010213          SBO001207          SBO315207 
                 0                  0                  0                  0 
         SBO115207          SBO215207          SBO515207          SBO415207 
                 0                  0                  0                  0 
         SBO015207          MAN450207          WTN220207          RTN130207 
                 0                  0                  0                  0 
         RTN131207          AFN120207          BPS030214          LND110210 
                 0                  0                  0                  0 
         POP060210 
                 0 

There are no missing values in the PresElect2016R dataset. However, there are 51 missing values in the state_abbreviation column of the UScounty_facts dataset. We now select the rows with missing values to inspect them in detail.
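The per-column missing-value check can be sketched in plain Python as follows. This is not the code used to produce the output above, which is not shown in the report. The function name `count_missing` and the three toy rows are illustrative assumptions, with values copied from the printed UScounty_facts rows.

```python
def count_missing(rows, missing=(None, "", "NA")):
    """Return {column: number of missing entries} for a list of dict rows."""
    counts = {col: 0 for col in rows[0]}
    for row in rows:
        for col, value in row.items():
            if value in missing:
                counts[col] += 1
    return counts


# Toy rows mirroring the layout of UScounty_facts (a tiny subset of columns).
facts = [
    {"fips": 1000, "area_name": "Alabama", "state_abbreviation": None, "PST045214": 4849377},
    {"fips": 1001, "area_name": "Autauga County", "state_abbreviation": "AL", "PST045214": 55395},
    {"fips": 1003, "area_name": "Baldwin County", "state_abbreviation": "AL", "PST045214": 200111},
]

missing_counts = count_missing(facts)
print(missing_counts)
```

On the full dataset, the same count would be expected to flag only state_abbreviation, matching the 51 missing values reported above.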

# A tibble: 51 × 53
    fips area_name    state_abbreviation PST045214 PST120214 POP010210 AGE135214
   <dbl> <chr>        <chr>                  <dbl>     <dbl>     <dbl>     <dbl>
 1  1000 Alabama      <NA>                 4849377       1.4   4779736       6.1
 2  2000 Alaska       <NA>                  736732       3.7    710231       7.4
 3  4000 Arizona      <NA>                 6731484       5.3   6392017       6.4
 4  5000 Arkansas     <NA>                 2966369       1.7   2915918       6.5
 5  6000 California   <NA>                38802500       4.2  37253956       6.5
 6  8000 Colorado     <NA>                 5355866       6.5   5029196       6.3
 7  9000 Connecticut  <NA>                 3596677       0.6   3574097       5.3
 8 10000 Delaware     <NA>                  935614       4.2    897934       6  
 9 11000 District Of… <NA>                  658893       9.5    601723       6.5
10 12000 Florida      <NA>                19893297       5.8  18801310       5.5
# ℹ 41 more rows
# ℹ 46 more variables: AGE295214 <dbl>, AGE775214 <dbl>, SEX255214 <dbl>,
#   RHI125214 <dbl>, RHI225214 <dbl>, RHI325214 <dbl>, RHI425214 <dbl>,
#   RHI525214 <dbl>, RHI625214 <dbl>, RHI725214 <dbl>, RHI825214 <dbl>,
#   POP715213 <dbl>, POP645213 <dbl>, POP815213 <dbl>, EDU635213 <dbl>,
#   EDU685213 <dbl>, VET605213 <dbl>, LFE305213 <dbl>, HSG010214 <dbl>,
#   HSG445213 <dbl>, HSG096213 <dbl>, HSG495213 <dbl>, HSD410213 <dbl>, …

The rows with a missing value in the state_abbreviation column contain the demographic and socio-economic indicators aggregated for each state (the 50 states plus the District of Columbia). We therefore separate the data into two datasets: one for the states and one for the counties.
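The split can be sketched as below. It is an assumption-laden illustration, not the report's own code: `split_states_and_counties` is a hypothetical helper, and the renaming of area_name to state or county is inferred from the column listings that follow.

```python
def split_states_and_counties(rows):
    """Split census rows into state-level and county-level lists."""
    states, counties = [], []
    for row in rows:
        # Indicator columns: everything except the three identifier columns.
        rest = {k: v for k, v in row.items()
                if k not in ("fips", "area_name", "state_abbreviation")}
        if row["state_abbreviation"] is None:
            # State summary row: keep the name as `state`, drop the identifiers.
            states.append({"state": row["area_name"], **rest})
        else:
            # County row: keep the identifiers, rename `area_name` to `county`.
            counties.append({
                "fips": row["fips"],
                "county": row["area_name"],
                "state_abbreviation": row["state_abbreviation"],
                **rest,
            })
    return states, counties


rows = [
    {"fips": 1000, "area_name": "Alabama", "state_abbreviation": None, "PST045214": 4849377},
    {"fips": 1001, "area_name": "Autauga County", "state_abbreviation": "AL", "PST045214": 55395},
    {"fips": 1003, "area_name": "Baldwin County", "state_abbreviation": "AL", "PST045214": 200111},
]
states, counties = split_states_and_counties(rows)
print(len(states), len(counties))
```

Dropping fips and state_abbreviation from the state rows is consistent with the state dataset having 51 columns while the county dataset keeps all 53.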

Rows: 51
Columns: 51
$ state     <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California", "C…
$ PST045214 <dbl> 4849377, 736732, 6731484, 2966369, 38802500, 5355866, 359667…
$ PST120214 <dbl> 1.4, 3.7, 5.3, 1.7, 4.2, 6.5, 0.6, 4.2, 9.5, 5.8, 4.2, 4.4, …
$ POP010210 <dbl> 4779736, 710231, 6392017, 2915918, 37253956, 5029196, 357409…
$ AGE135214 <dbl> 6.1, 7.4, 6.4, 6.5, 6.5, 6.3, 5.3, 6.0, 6.5, 5.5, 6.6, 6.4, …
$ AGE295214 <dbl> 22.8, 25.3, 24.1, 23.8, 23.6, 23.3, 21.6, 21.8, 17.5, 20.4, …
$ AGE775214 <dbl> 15.3, 9.4, 15.9, 15.7, 12.9, 12.7, 15.5, 16.4, 11.3, 19.1, 1…
$ SEX255214 <dbl> 51.5, 47.4, 50.3, 50.9, 50.3, 49.8, 51.2, 51.6, 52.6, 51.1, …
$ RHI125214 <dbl> 69.7, 66.9, 83.7, 79.7, 73.2, 87.7, 81.2, 70.8, 43.6, 77.8, …
$ RHI225214 <dbl> 26.7, 3.9, 4.7, 15.6, 6.5, 4.5, 11.5, 22.2, 49.0, 16.8, 31.5…
$ RHI325214 <dbl> 0.7, 14.8, 5.3, 1.0, 1.7, 1.6, 0.5, 0.7, 0.6, 0.5, 0.5, 0.4,…
$ RHI425214 <dbl> 1.3, 6.1, 3.3, 1.5, 14.4, 3.1, 4.5, 3.8, 4.0, 2.8, 3.8, 37.5…
$ RHI525214 <dbl> 0.1, 1.3, 0.3, 0.3, 0.5, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 10.0,…
$ RHI625214 <dbl> 1.5, 7.1, 2.7, 1.9, 3.7, 2.9, 2.2, 2.5, 2.6, 2.0, 2.0, 23.0,…
$ RHI725214 <dbl> 4.1, 6.8, 30.5, 7.0, 38.6, 21.2, 15.0, 8.9, 10.4, 24.1, 9.3,…
$ RHI825214 <dbl> 66.2, 61.9, 56.2, 73.4, 38.5, 69.0, 68.8, 63.7, 35.8, 55.8, …
$ POP715213 <dbl> 85.0, 80.3, 80.4, 83.6, 84.2, 80.7, 88.0, 86.3, 80.6, 83.7, …
$ POP645213 <dbl> 3.5, 7.0, 13.4, 4.5, 27.0, 9.7, 13.6, 8.4, 13.8, 19.4, 9.7, …
$ POP815213 <dbl> 5.2, 16.2, 26.8, 7.2, 43.7, 16.8, 21.5, 12.6, 15.8, 27.4, 13…
$ EDU635213 <dbl> 83.1, 91.6, 85.7, 83.7, 81.2, 90.2, 89.2, 87.7, 88.4, 86.1, …
$ EDU685213 <dbl> 22.6, 27.5, 26.9, 20.1, 30.7, 37.0, 36.5, 28.9, 52.4, 26.4, …
$ VET605213 <dbl> 388865, 71004, 522382, 237311, 1893539, 399458, 217947, 7508…
$ LFE305213 <dbl> 24.2, 18.8, 24.6, 21.3, 27.2, 24.5, 24.8, 24.8, 29.7, 25.9, …
$ HSG010214 <dbl> 2207912, 308583, 2909218, 1341033, 13900766, 2276184, 149356…
$ HSG445213 <dbl> 69.7, 63.8, 64.4, 66.7, 55.3, 65.4, 67.8, 72.5, 42.1, 67.1, …
$ HSG096213 <dbl> 15.9, 24.0, 20.7, 15.7, 31.0, 25.9, 34.4, 17.6, 62.3, 30.1, …
$ HSG495213 <dbl> 122500, 241800, 165100, 107300, 366400, 236200, 278900, 2358…
$ HSD410213 <dbl> 1838683, 251899, 2370289, 1129723, 12542460, 1977591, 135584…
$ HSD310213 <dbl> 2.55, 2.75, 2.67, 2.53, 2.94, 2.53, 2.55, 2.63, 2.20, 2.61, …
$ INC910213 <dbl> 23680, 32651, 25358, 22170, 29527, 31109, 37892, 29819, 4529…
$ INC110213 <dbl> 43253, 70760, 49774, 40768, 61094, 58433, 69461, 59878, 6583…
$ PVY020213 <dbl> 18.6, 9.9, 17.9, 19.2, 15.9, 13.2, 10.2, 11.7, 18.6, 16.3, 1…
$ BZA010213 <dbl> 97578, 20519, 132762, 64772, 874243, 154875, 88498, 24151, 2…
$ BZA110213 <dbl> 1603100, 266627, 2173205, 978094, 13401863, 2090975, 1473605…
$ BZA115213 <dbl> 1.1, 3.3, 1.8, 0.0, 3.5, 2.7, 0.7, 5.1, 1.7, 2.9, 2.0, 2.1, …
$ NES010213 <dbl> 311578, 52991, 420233, 191530, 2983996, 447586, 263511, 5686…
$ SBO001207 <dbl> 382350, 68728, 491529, 238994, 3425510, 547770, 332150, 7457…
$ SBO315207 <dbl> 14.8, 1.5, 2.0, 5.5, 4.0, 1.7, 4.4, 8.7, 28.2, 9.0, 20.4, 0.…
$ SBO115207 <dbl> 0.8, 10.0, 1.9, 1.1, 1.3, 0.8, 0.5, 0.0, 0.9, 0.5, 0.7, 1.3,…
$ SBO215207 <dbl> 1.8, 3.1, 3.3, 1.4, 14.9, 2.6, 3.3, 4.0, 5.9, 3.2, 5.1, 47.2…
$ SBO515207 <dbl> 0.1, 0.3, 0.0, 0.1, 0.3, 0.1, 0.0, 0.0, 0.0, 0.1, 0.1, 9.5, …
$ SBO415207 <dbl> 1.2, 0.0, 10.7, 2.3, 16.5, 6.2, 4.2, 2.1, 6.1, 22.4, 3.6, 3.…
$ SBO015207 <dbl> 28.1, 25.9, 28.1, 24.5, 30.3, 29.2, 28.1, 26.1, 34.5, 28.9, …
$ MAN450207 <dbl> 112858843, 8204030, 57977827, 60735582, 491372092, 46331953,…
$ WTN220207 <dbl> 52252752, 4563605, 57573459, 29659789, 598456486, 53598986, …
$ RTN130207 <dbl> 57344851, 9303387, 86758801, 32974282, 455032270, 65896788, …
$ RTN131207 <dbl> 12364, 13635, 13637, 11602, 12561, 13609, 14953, 16421, 6555…
$ AFN120207 <dbl> 6426342, 1851293, 13268514, 3559795, 80852787, 11440395, 913…
$ BPS030214 <dbl> 13369, 1518, 26997, 7666, 83645, 28686, 5329, 5194, 4189, 84…
$ LND110210 <dbl> 50645.33, 570640.95, 113594.08, 52035.48, 155779.22, 103641.…
$ POP060210 <dbl> 94.4, 1.2, 56.3, 56.0, 239.1, 48.5, 738.1, 460.8, 9856.5, 35…
Rows: 3,143
Columns: 53
$ fips               <dbl> 1001, 1003, 1005, 1007, 1009, 1011, 1013, 1015, 101…
$ county             <chr> "Autauga County", "Baldwin County", "Barbour County…
$ state_abbreviation <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL…
$ PST045214          <dbl> 55395, 200111, 26887, 22506, 57719, 10764, 20296, 1…
$ PST120214          <dbl> 1.5, 9.8, -2.1, -1.8, 0.7, -1.4, -3.1, -2.3, -0.3, …
$ POP010210          <dbl> 54571, 182265, 27457, 22915, 57322, 10914, 20947, 1…
$ AGE135214          <dbl> 6.0, 5.6, 5.7, 5.3, 6.1, 6.3, 6.1, 5.7, 5.9, 4.8, 6…
$ AGE295214          <dbl> 25.2, 22.2, 21.2, 21.0, 23.6, 21.4, 23.6, 22.2, 21.…
$ AGE775214          <dbl> 13.8, 18.7, 16.5, 14.8, 17.0, 14.9, 18.0, 16.0, 18.…
$ SEX255214          <dbl> 51.4, 51.2, 46.6, 45.9, 50.5, 45.3, 53.6, 51.8, 52.…
$ RHI125214          <dbl> 77.9, 87.1, 50.2, 76.3, 96.0, 26.9, 53.9, 75.8, 58.…
$ RHI225214          <dbl> 18.7, 9.6, 47.6, 22.1, 1.8, 70.1, 44.0, 21.1, 39.5,…
$ RHI325214          <dbl> 0.5, 0.7, 0.6, 0.4, 0.6, 0.8, 0.4, 0.5, 0.3, 0.5, 0…
$ RHI425214          <dbl> 1.1, 0.9, 0.5, 0.2, 0.3, 0.3, 0.9, 0.9, 0.8, 0.3, 0…
$ RHI525214          <dbl> 0.1, 0.1, 0.2, 0.1, 0.1, 0.7, 0.0, 0.1, 0.1, 0.0, 0…
$ RHI625214          <dbl> 1.8, 1.6, 0.9, 0.9, 1.2, 1.1, 0.8, 1.7, 1.1, 1.6, 1…
$ RHI725214          <dbl> 2.7, 4.6, 4.5, 2.1, 8.7, 7.5, 1.2, 3.5, 2.0, 1.5, 7…
$ RHI825214          <dbl> 75.6, 83.0, 46.6, 74.5, 87.8, 22.1, 53.1, 72.9, 56.…
$ POP715213          <dbl> 85.0, 82.1, 84.8, 86.6, 88.7, 84.7, 94.6, 83.6, 85.…
$ POP645213          <dbl> 1.6, 3.6, 2.9, 1.2, 4.3, 5.4, 0.8, 2.4, 1.1, 0.7, 5…
$ POP815213          <dbl> 3.5, 5.5, 5.0, 2.1, 7.3, 5.2, 1.7, 4.5, 1.3, 1.1, 7…
$ EDU635213          <dbl> 85.6, 89.1, 73.7, 77.5, 77.0, 67.8, 76.3, 78.6, 75.…
$ EDU685213          <dbl> 20.9, 27.7, 13.4, 12.1, 12.1, 12.5, 14.0, 16.1, 11.…
$ VET605213          <dbl> 5922, 19346, 2120, 1327, 4540, 636, 1497, 11385, 26…
$ LFE305213          <dbl> 26.2, 25.9, 24.6, 27.6, 33.9, 26.9, 24.0, 22.5, 24.…
$ HSG010214          <dbl> 22751, 107374, 11799, 8978, 23826, 4461, 9916, 5328…
$ HSG445213          <dbl> 76.8, 72.6, 67.7, 79.0, 81.0, 74.3, 70.3, 68.7, 67.…
$ HSG096213          <dbl> 8.3, 24.4, 10.6, 7.3, 4.5, 8.7, 13.3, 13.8, 11.1, 4…
$ HSG495213          <dbl> 136200, 168600, 89200, 90500, 117100, 70600, 74700,…
$ HSD410213          <dbl> 20071, 73283, 9200, 7091, 21108, 3741, 8235, 45196,…
$ HSD310213          <dbl> 2.71, 2.52, 2.66, 3.03, 2.70, 2.73, 2.47, 2.54, 2.4…
$ INC910213          <dbl> 24571, 26766, 16829, 17427, 20730, 18628, 17403, 20…
$ INC110213          <dbl> 53682, 50221, 32911, 36447, 44145, 32033, 29918, 39…
$ PVY020213          <dbl> 12.1, 13.9, 26.7, 18.1, 15.8, 21.6, 28.4, 21.9, 24.…
$ BZA010213          <dbl> 817, 4871, 464, 275, 660, 112, 393, 2311, 515, 379,…
$ BZA110213          <dbl> 10120, 54988, 6611, 3145, 6798, 0, 5711, 34871, 643…
$ BZA115213          <dbl> 2.1, 3.7, -5.6, 7.5, 3.4, 0.0, 2.7, 0.6, -0.2, 5.5,…
$ NES010213          <dbl> 2947, 16508, 1546, 1126, 3563, 470, 1095, 6352, 235…
$ SBO001207          <dbl> 4067, 19035, 1667, 1385, 4458, 417, 1769, 8713, 198…
$ SBO315207          <dbl> 15.2, 2.7, 0.0, 14.9, 0.0, 0.0, 0.0, 7.2, 0.0, 0.0,…
$ SBO115207          <dbl> 0.0, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0…
$ SBO215207          <dbl> 1.3, 1.0, 0.0, 0.0, 0.0, 0.0, 3.3, 1.6, 0.0, 0.0, 0…
$ SBO515207          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ SBO415207          <dbl> 0.7, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0…
$ SBO015207          <dbl> 31.7, 27.3, 27.0, 0.0, 23.2, 38.8, 0.0, 24.7, 29.3,…
$ MAN450207          <dbl> 0, 1410273, 0, 0, 341544, 0, 399132, 2679991, 66728…
$ WTN220207          <dbl> 0, 0, 0, 0, 0, 0, 56712, 0, 0, 62293, 155139, 52904…
$ RTN130207          <dbl> 598175, 2966489, 188337, 124707, 319700, 43810, 229…
$ RTN131207          <dbl> 12003, 17166, 6334, 5804, 5622, 3995, 11326, 13678,…
$ AFN120207          <dbl> 88157, 436955, 0, 10757, 20941, 3670, 28427, 186533…
$ BPS030214          <dbl> 131, 1384, 8, 19, 3, 1, 2, 114, 8, 2, 78, 0, 11, 0,…
$ LND110210          <dbl> 594.44, 1589.78, 884.88, 622.58, 644.78, 622.81, 77…
$ POP060210          <dbl> 91.8, 114.6, 31.0, 36.8, 88.9, 17.5, 27.0, 195.7, 5…

Next, we need to check the dimensions of the datasets.

[1] 3141    9
[1] 3143   53

The two datasets do not have the same number of rows, so we need to find out which counties appear in one file but not in the other.

# A tibble: 1 × 3
   FIPS county      state   
  <dbl> <chr>       <chr>   
1 36000 Kansas City Missouri
# A tibble: 3 × 3
   fips county           state_abbreviation
  <dbl> <chr>            <chr>             
1 15005 Kalawao County   HI                
2 31103 Keya Paha County NE                
3 51515 Bedford city     VA                
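The output above shows one election record (FIPS 36000, Kansas City, Missouri) with no match in the county facts, and three counties in the county facts (Kalawao County, HI; Keya Paha County, NE; Bedford city, VA) with no election record. A comparison of this kind amounts to two set differences on the FIPS code. The sketch below uses a handful of hand-picked FIPS codes from the printed outputs and does not reproduce the report's own code.

```python
# FIPS code -> name, using a few records taken from the outputs above.
election = {1001: "Autauga", 1003: "Baldwin", 36000: "Kansas City"}
facts = {
    1001: "Autauga County",
    1003: "Baldwin County",
    15005: "Kalawao County",
    31103: "Keya Paha County",
    51515: "Bedford city",
}

only_in_election = sorted(set(election) - set(facts))   # no census record
only_in_facts = sorted(set(facts) - set(election))      # no election record
matched = sorted(set(election) & set(facts))            # usable for modelling

print(only_in_election)
print(only_in_facts)
print(matched)
```

Only the matched FIPS codes can be joined, which is consistent with the row counts above: 3,141 minus 1 and 3,143 minus 3 both give 3,140 counties.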

2.1.2 Statistical Summary

# A tibble: 50 × 61
# Groups:   state [50]
   state      state.po totalvotes votesR votesD  fracR fracD frac_diff partywonR
   <chr>      <chr>         <dbl>  <dbl>  <dbl>  <dbl> <dbl>     <dbl>     <dbl>
 1 Alaska     AK           279524 1.63e5 1.16e5 0.584  0.416    0.169          1
 2 Arizona    AZ          2413568 1.25e6 1.16e6 0.519  0.481    0.0378         1
 3 Arkansas   AR          1065366 6.85e5 3.80e5 0.643  0.357    0.286          1
 4 California CA         13237598 4.48e6 8.75e6 0.339  0.661    0.323          0
 5 Colorado   CO          2541354 1.20e6 1.34e6 0.473  0.527    0.0537         0
 6 Connectic… CT          1570787 6.73e5 8.98e5 0.429  0.571    0.143          0
 7 Delaware   DE           420730 1.85e5 2.36e5 0.440  0.560    0.120          0
 8 District … DC           295553 1.27e4 2.83e5 0.0430 0.957    0.914          0
 9 Florida    FL          9122861 4.62e6 4.50e6 0.506  0.494    0.0124         1
10 Georgia    GA          3967067 2.09e6 1.88e6 0.527  0.473    0.0532         1
# ℹ 40 more rows
# ℹ 52 more variables: winning_party <chr>, electoral_votes <dbl>,
#   PST045214 <dbl>, PST120214 <dbl>, POP010210 <dbl>, AGE135214 <dbl>,
#   AGE295214 <dbl>, AGE775214 <dbl>, SEX255214 <dbl>, RHI125214 <dbl>,
#   RHI225214 <dbl>, RHI325214 <dbl>, RHI425214 <dbl>, RHI525214 <dbl>,
#   RHI625214 <dbl>, RHI725214 <dbl>, RHI825214 <dbl>, POP715213 <dbl>,
#   POP645213 <dbl>, POP815213 <dbl>, EDU635213 <dbl>, EDU685213 <dbl>, …

Summary of 2016 Presidential Election Results

                        --- Votes Received ---  ------- Votes Share -------
Category                Republican  Democratic  Republican %  Democratic %        Total
Total votes received    62,977,826  65,840,274         48.89         51.11  128,818,100
Number of counties won       2,634         507         83.86         16.14        3,141
Number of states won            30          21         58.82         41.18           51
Electoral votes                306         232         56.88         43.12          538

There are 238,828,200 people of voting age in total, with a turnout rate of 54.8%, meaning that 54.8% of people aged 18 or above voted in this election. The Democratic party received almost 3 million more votes than the Republican party, with 65,840,274 votes (51.11% of the two-party vote) compared to 62,977,826 votes (48.89%). Despite this, the Republican party won far more counties: 2,634 counties (83.86%) favoured it, while only 507 counties (16.14%) favoured the Democratic party. In terms of state victories, the Republicans secured 30 states (58.82%), while the Democrats won 21 (41.18%), counting the District of Columbia as a state. This gave the Republican party more electoral votes, with 306 (56.88%) compared to the Democratic party’s 232 (43.12%), and ultimately led to its victory in the 2016 presidential election.
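The shares quoted above follow directly from the counts in the summary table. The short Python check below recomputes them; the helper name `share` is illustrative, and all numbers come from the table.

```python
def share(part, total, digits=2):
    """Percentage of `total` represented by `part`, rounded to `digits`."""
    return round(100 * part / total, digits)


votes_r, votes_d = 62_977_826, 65_840_274
two_party_total = votes_r + votes_d          # third-party votes are excluded

popular_margin = votes_d - votes_r
print(two_party_total, popular_margin)
print(share(votes_r, two_party_total), share(votes_d, two_party_total))
print(share(2_634, 3_141), share(30, 51))    # counties and states won
print(share(306, 538), share(232, 538))      # electoral votes
```

Because the denominator is the two-party total, the Republican and Democratic percentages sum to 100 in every row.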

2016 Presidential Election Voting Results by State

                      --- Votes Received ---  ------- Vote Share -------
State                 Republican  Democratic  Republican %  Democratic %  Winner
Alabama                1,318,250     729,547         64.37         35.63  Republican
Alaska                   163,343     116,181         58.44         41.56  Republican
Arizona                1,252,401   1,161,167         51.89         48.11  Republican
Arkansas                 684,872     380,494         64.29         35.71  Republican
California             4,483,810   8,753,788         33.87         66.13  Democratic
Colorado               1,202,484   1,338,870         47.32         52.68  Democratic
Connecticut              673,215     897,572         42.86         57.14  Democratic
Delaware                 185,127     235,603         44.00         56.00  Democratic
District of Columbia      12,723     282,830          4.30         95.70  Democratic
Florida                4,617,886   4,504,975         50.62         49.38  Republican
Georgia                2,089,104   1,877,963         52.66         47.34  Republican
Hawaii                   128,847     266,891         32.56         67.44  Democratic
Idaho                    409,055     189,765         68.31         31.69  Republican
Illinois               2,146,015   3,090,729         40.98         59.02  Democratic
Indiana                1,557,286   1,033,126         60.12         39.88  Republican
Iowa                     800,983     653,669         55.06         44.94  Republican
Kansas                   671,018     427,005         61.11         38.89  Republican
Kentucky               1,202,971     628,854         65.67         34.33  Republican
Louisiana              1,178,638     780,154         60.17         39.83  Republican
Maine                    334,945     354,718         48.57         51.43  Democratic
Maryland                 943,169   1,677,928         35.98         64.02  Democratic
Massachusetts          1,090,893   1,995,196         35.35         64.65  Democratic
Michigan               2,279,543   2,268,839         50.12         49.88  Republican
Minnesota              1,322,951   1,367,716         49.17         50.83  Democratic
Mississippi              700,714     485,131         59.09         40.91  Republican
Missouri               1,594,511   1,071,068         59.82         40.18  Republican
Montana                  279,240     177,709         61.11         38.89  Republican
Nebraska                 495,501     284,454         63.53         36.47  Republican
Nevada                   512,058     539,260         48.71         51.29  Democratic
New Hampshire            345,790     348,526         49.80         50.20  Democratic
New Jersey             1,601,933   2,148,278         42.72         57.28  Democratic
New Mexico               319,667     385,234         45.35         54.65  Democratic
New York               2,814,589   4,547,562         38.23         61.77  Democratic
North Carolina         2,362,631   2,189,316         51.90         48.10  Republican
North Dakota             216,794      93,758         69.81         30.19  Republican
Ohio                   2,841,005   2,394,164         54.27         45.73  Republican
Oklahoma                 949,136     420,375         69.30         30.70  Republican
Oregon                   782,403   1,002,106         43.84         56.16  Democratic
Pennsylvania           2,970,733   2,926,441         50.38         49.62  Republican
Rhode Island             180,490     251,888         41.74         58.26  Democratic
South Carolina         1,155,389     855,373         57.46         42.54  Republican
South Dakota             227,721     117,458         65.97         34.03  Republican
Tennessee              1,522,925     870,695         63.62         36.38  Republican
Texas                  4,685,047   3,877,868         54.71         45.29  Republican
Utah                     515,231     310,676         62.38         37.62  Republican
Vermont                   95,369     178,573         34.81         65.19  Democratic
Virginia               1,769,443   1,981,473         47.17         52.83  Democratic
Washington             1,221,747   1,742,718         41.21         58.79  Democratic
West Virginia            489,371     188,794         72.16         27.84  Republican
Wisconsin              1,404,440   1,381,823         50.41         49.59  Republican
Wyoming                  174,419      55,973         75.71         24.29  Republican

First, let’s see which counties have the highest Republican and Democratic vote shares.

Top 5 Counties with Highest Republican Vote Share

County       State     Republican Vote Share
Roberts      Texas                     0.946
King         Texas                     0.937
Motley       Texas                     0.920
Hayes        Nebraska                  0.918
Shackelford  Texas                     0.916

Four of the five counties with the highest Republican vote share are in Texas.
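A ranking of this kind can be sketched as follows. The rows are the first six Alabama records printed earlier, so the result is not the national top five. The share is assumed here to be the candidate's votes divided by totalvotes; the report does not show how its shares were computed, and `top_by_share` is a hypothetical helper.

```python
# (county, state, candidatevotesR, candidatevotesD, totalvotes)
counties = [
    ("Autauga", "Alabama", 18172, 5936, 24973),
    ("Baldwin", "Alabama", 72883, 18458, 95215),
    ("Barbour", "Alabama", 5454, 4871, 10469),
    ("Bibb", "Alabama", 6738, 1874, 8819),
    ("Blount", "Alabama", 22859, 2156, 25588),
    ("Bullock", "Alabama", 1140, 3530, 4710),
]


def top_by_share(rows, party="R", n=5):
    """Return the n rows with the highest vote share for the given party."""
    idx = 2 if party == "R" else 3           # column holding that party's votes
    ranked = sorted(rows, key=lambda r: r[idx] / r[4], reverse=True)
    return [(r[0], r[1], round(r[idx] / r[4], 3)) for r in ranked[:n]]


top_r = top_by_share(counties, "R", n=3)
top_d = top_by_share(counties, "D", n=1)
print(top_r)
print(top_d)
```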

Top 5 Counties with Highest Democratic Vote Share

County                State                 Democratic Vote Share
District of Columbia  District of Columbia                  0.909
Bronx                 New York                              0.885
Prince George's       Maryland                              0.881
Petersburg            Virginia                              0.872
Claiborne             Mississippi                           0.868

Average Socio-Economic Indicators of Counties by Favouring Party

Favouring Party  Median House Value  Income per Capita  Bachelor's Degree Pct  White Population Pct  Foreign-Born Pct  Women Pct
Democratic               198,993.44          25,872.56                  33.36                 50.18             18.82      51.06
Republican               118,008.53          23,187.74                  23.58                 76.35              6.12      50.45
             state           state.po         totalvotes             votesR 
                 0                  0                  0                  0 
            votesD              fracR              fracD          frac_diff 
                 0                  0                  0                  0 
         partywonR      winning_party    electoral_votes          PST045214 
                 0                  0                  0                  0 
         PST120214          POP010210          AGE135214          AGE295214 
                 0                  0                  0                  0 
         AGE775214          SEX255214          RHI125214          RHI225214 
                 0                  0                  0                  0 
         RHI325214          RHI425214          RHI525214          RHI625214 
                 0                  0                  0                  0 
         RHI725214          RHI825214          POP715213          POP645213 
                 0                  0                  0                  0 
         POP815213          EDU635213          EDU685213          VET605213 
                 0                  0                  0                  0 
         LFE305213          HSG010214          HSG445213          HSG096213 
                 0                  0                  0                  0 
         HSG495213          HSD410213          HSD310213          INC910213 
                 0                  0                  0                  0 
         INC110213          PVY020213          BZA010213          BZA110213 
                 0                  0                  0                  0 
         BZA115213          NES010213          SBO001207          SBO315207 
                 0                  0                  0                  0 
         SBO115207          SBO215207          SBO515207          SBO415207 
                 0                  0                  0                  0 
         SBO015207          MAN450207          WTN220207          RTN130207 
                 0                  0                  0                  0 
         RTN131207          AFN120207          BPS030214          LND110210 
                 0                  0                  0                  0 
         POP060210 EDU685213_weighted RHI825214_weighted POP645213_weighted 
                 0                  0                  0                  0 
SEX255214_weighted 
                 0 

Average Socio-Economic Indicators of States by Favouring Party

Favouring Party  Median House Value  Income per Capita  Bachelor's Degree Pct  White Population Pct  Foreign-Born Pct  Women Pct
Republican               194,876.47          28,053.80                  28.83                 62.05             12.99      50.78
Democratic                      NaN                NaN                    NaN                   NaN               NaN        NaN

The Democratic row is NaN throughout and its total_population is 0.00 in the full table below, which suggests that no state was assigned to the Democratic group when this summary was computed.

Average Socio-Economic Indicators of States by Favouring Party (all indicators)

Variable                Republican  Democratic
HSG495213               194,876.47         NaN
INC910213                28,053.80         NaN
INC110213                53,530.27         NaN
MAN450207           104,672,676.49         NaN
WTN220207            81,848,755.22         NaN
RTN130207            76,816,930.51         NaN
RTN131207                13,387.69         NaN
AFN120207            12,035,210.43         NaN
BPS030214                20,516.94         NaN
PST120214                     3.10         NaN
VET605213               416,936.84         NaN
LFE305213                    23.75         NaN
HSG010214             2,626,611.37         NaN
HSG445213                    66.43         NaN
HSD410213             2,266,866.98         NaN
HSD310213                     2.56         NaN
BZA010213               146,830.45         NaN
BZA110213             2,318,946.14         NaN
BZA115213                     1.69         NaN
NES010213               451,090.59         NaN
SBO001207               534,481.65         NaN
LND110210                69,253.05         NaN
POP060210                   384.40         NaN
AGE135214                     6.25         NaN
AGE295214                    23.08         NaN
AGE775214                    14.51         NaN
SEX255214                    50.78         NaN
RHI125214                    77.36         NaN
RHI225214                    13.22         NaN
RHI325214                     1.25         NaN
RHI425214                     5.43         NaN
RHI525214                     0.23         NaN
RHI625214                     2.51         NaN
RHI725214                    17.37         NaN
RHI825214                    62.05         NaN
POP715213                    84.86         NaN
POP645213                    12.99         NaN
POP815213                    20.80         NaN
EDU635213                    85.95         NaN
EDU685213                    28.83         NaN
HSG096213                    26.19         NaN
PVY020213                    15.36         NaN
SBO315207                     6.95         NaN
SBO115207                     0.87         NaN
SBO215207                     5.59         NaN
SBO515207                     0.14         NaN
SBO415207                     8.04         NaN
SBO015207                    28.55         NaN
total_population    318,857,056.00        0.00

Average Socio-Economic Indicators of Counties by Favouring Party (all indicators)

Variable                Democratic      Republican
HSG495213               198,993.44      118,008.53
INC910213                25,872.56       23,187.74
INC110213                49,758.27       45,210.49
MAN450207             4,715,100.42      792,693.60
WTN220207             5,481,569.52      397,561.91
RTN130207             4,420,280.44      641,172.78
RTN131207                12,331.48        9,840.60
AFN120207               799,118.33       77,485.80
BPS030214                 1,106.17          185.56
PST120214                     1.66            0.24
VET605213                19,295.33        4,382.83
LFE305213                    23.25           22.93
HSG010214               139,508.72       24,187.95
HSG445213                    64.37           73.66
HSD410213               123,092.09       20,360.90
HSD310213                     2.63            2.51
BZA010213                 8,543.14        1,203.26
BZA110213               140,287.74       16,908.64
BZA115213                     0.96            0.62
NES010213                26,892.84        3,594.23
SBO001207                31,355.84        4,484.11
LND110210                 1,840.07          988.08
POP060210                 1,111.98           96.63
AGE135214                     6.38            6.07
AGE295214                    22.97           23.21
AGE775214                    13.22           16.04
SEX255214                    51.06           50.45
RHI125214                    70.48           85.64
RHI225214                    16.96            8.73
RHI325214                     1.15            1.35
RHI425214                     8.25            2.06
RHI525214                     0.32            0.13
RHI625214                     2.84            2.10
RHI725214                    23.10           10.48
RHI825214                    50.18           76.35
POP715213                    84.47           85.29
POP645213                    18.82            6.12
POP815213                    29.28           10.87
EDU635213                    85.50           86.46
EDU685213                    33.36           23.58
HSG096213                    34.70           16.41
PVY020213                    15.73           14.93
SBO315207                     9.23            3.00
SBO115207                     0.68            0.51
SBO215207                     7.97            1.56
SBO515207                     0.16            0.01
SBO415207                    10.95            3.09
SBO015207                    29.57           23.72
total_population    174,148,371.00  144,707,786.00
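The presence of `_weighted` columns and a `total_population` column suggests that the group averages above are population-weighted, although the report does not say so. The sketch below shows that technique under this assumption. The four counties use PST045214 (population) and EDU685213 (bachelor's degree percentage) values from the county listing, grouped by the winner implied by their vote counts. The function name `weighted_mean` is illustrative.

```python
import math


def weighted_mean(values, weights):
    """Weighted average; NaN when the group is empty or has zero total weight."""
    total = sum(weights)
    if total == 0:
        return math.nan
    return sum(v * w for v, w in zip(values, weights)) / total


# (county, winner, population PST045214, bachelor's degree pct EDU685213)
rows = [
    ("Autauga", "Republican", 55395, 20.9),
    ("Baldwin", "Republican", 200111, 27.7),
    ("Barbour", "Republican", 26887, 13.4),
    ("Bullock", "Democratic", 10764, 12.5),
]

averages = {}
for party in ("Republican", "Democratic", "Other"):
    group = [r for r in rows if r[1] == party]
    averages[party] = weighted_mean([r[3] for r in group], [r[2] for r in group])

print(averages)
```

An empty group yields NaN with a total weight of zero, the same pattern seen in the Democratic row of the state-level table.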

2.1.3 Visualisation

2.1.3.1 Win Margin by State

Figure 1

2.1.3.2 Winning Party Map

Figure 2

There seems to be a correlation between geographical location and voting pattern. For example, the central states tend to vote Republican, while the coastal states and some northern states tend to vote Democratic.

To examine this pattern, we divide the states into regions and summarise the voting results by region.

                  state    region           division
1           Connecticut Northeast        New England
2                 Maine Northeast        New England
3         Massachusetts Northeast        New England
4         New Hampshire Northeast        New England
5          Rhode Island Northeast        New England
6               Vermont Northeast        New England
7            New Jersey Northeast    Middle Atlantic
8              New York Northeast    Middle Atlantic
9          Pennsylvania Northeast    Middle Atlantic
10              Indiana   Midwest East North Central
11             Illinois   Midwest East North Central
12             Michigan   Midwest East North Central
13                 Ohio   Midwest East North Central
14            Wisconsin   Midwest East North Central
15                 Iowa   Midwest West North Central
16               Kansas   Midwest West North Central
17            Minnesota   Midwest West North Central
18             Missouri   Midwest West North Central
19             Nebraska   Midwest West North Central
20         North Dakota   Midwest West North Central
21         South Dakota   Midwest West North Central
22             Delaware     South     South Atlantic
23 District of Columbia     South     South Atlantic
24              Florida     South     South Atlantic
25              Georgia     South     South Atlantic
26             Maryland     South     South Atlantic
27       North Carolina     South     South Atlantic
28       South Carolina     South     South Atlantic
29             Virginia     South     South Atlantic
30        West Virginia     South     South Atlantic
31              Alabama     South East South Central
32             Kentucky     South East South Central
33          Mississippi     South East South Central
34            Tennessee     South East South Central
35             Arkansas     South West South Central
36            Louisiana     South West South Central
37             Oklahoma     South West South Central
38                Texas     South West South Central
39              Arizona      West           Mountain
40             Colorado      West           Mountain
41                Idaho      West           Mountain
42              Montana      West           Mountain
43               Nevada      West           Mountain
44           New Mexico      West           Mountain
45                 Utah      West           Mountain
46              Wyoming      West           Mountain
47               Alaska      West            Pacific
48           California      West            Pacific
49               Hawaii      West            Pacific
50               Oregon      West            Pacific
51           Washington      West            Pacific

Average Vote Share by Region

Region      Republican Vote Share   Democratic Vote Share
Midwest     0.52                    0.48
Northeast   0.43                    0.57
South       0.54                    0.46
West        0.42                    0.58

Average Vote Share by Division

Division             Republican Vote Share   Democratic Vote Share
East North Central   0.50                    0.50
East South Central   0.64                    0.36
Middle Atlantic      0.43                    0.57
Mountain             0.53                    0.47
New England          0.40                    0.60
Pacific              0.36                    0.64
South Atlantic       0.50                    0.50
West North Central   0.57                    0.43
West South Central   0.58                    0.42

On average, the Midwest (0.52) and South (0.54) regions lean Republican, while the Northeast (0.57) and West (0.58) regions lean Democratic.
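The regional averages can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `state_votes` and its values are made up, the Democratic share is taken as one minus the Republican share, and the average is an unweighted mean of state-level shares, which may differ from how the table above was actually computed.

```r
# Toy state-level data (hypothetical values, not the report's data)
state_votes <- data.frame(
  state  = c("Ohio", "Iowa", "Vermont", "Maine", "Texas", "Oregon"),
  region = c("Midwest", "Midwest", "Northeast", "Northeast", "South", "West"),
  fracR  = c(0.54, 0.55, 0.33, 0.47, 0.55, 0.42)
)
# Assumption: two-party shares, so the Democratic share is the complement
state_votes$fracD <- 1 - state_votes$fracR

# Unweighted mean of the state-level shares within each region
region_avg <- aggregate(cbind(fracR, fracD) ~ region,
                        data = state_votes, FUN = mean)

# Label each region by the party with the larger average share
region_avg$leans <- ifelse(region_avg$fracR > region_avg$fracD,
                           "Republican", "Democratic")
print(region_avg)
```

Weighting each state by its total votes would instead give the share of all votes cast in the region, which can differ noticeably when state sizes vary.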

We next break the regions down into divisions and examine the voting results by division.

There does seem to be a correlation between geographical location and voting pattern. For example, the East South Central (0.64), West South Central (0.58), and West North Central (0.57) divisions tend to vote Republican, while the Pacific (0.64) and New England (0.60) divisions tend to vote Democratic. However, there are still divisions where the voting pattern is not as clear-cut, such as East North Central and South Atlantic (both split 0.50 to 0.50) and Mountain (0.53 to 0.47).
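One way to make "clear-cut" concrete is to compute the margin between the two average shares and flag divisions whose margin falls below a cut-off. The sketch below uses the division averages from the table above. The 0.10 cut-off and the column names `margin` and `lean` are assumptions chosen for illustration, not part of the original analysis.

```r
# Division averages copied from the table above
division_avg <- data.frame(
  division = c("East North Central", "East South Central", "Middle Atlantic",
               "Mountain", "New England", "Pacific", "South Atlantic",
               "West North Central", "West South Central"),
  fracR = c(0.50, 0.64, 0.43, 0.53, 0.40, 0.36, 0.50, 0.57, 0.58),
  fracD = c(0.50, 0.36, 0.57, 0.47, 0.60, 0.64, 0.50, 0.43, 0.42)
)

# Positive margin = Republican lead, negative = Democratic lead
division_avg$margin <- division_avg$fracR - division_avg$fracD

# Hypothetical cut-off: margins smaller than this are treated as "mixed"
threshold <- 0.10
division_avg$lean <- ifelse(
  abs(division_avg$margin) < threshold, "mixed",
  ifelse(division_avg$margin > 0, "Republican", "Democratic")
)

# Sort from most Republican to most Democratic
print(division_avg[order(-division_avg$margin), ])
```

A different cut-off would move divisions between categories, so the labels should be read as a rough guide rather than a fixed classification.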

This suggests that adding a new variable for region or division could be useful in predicting voting outcomes based on socio-economic indicators.
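A minimal sketch of how such a variable could be attached to county-level data is shown below. The data frames `lookup` and `counties` and the values in them are hypothetical. The sketch assumes the region is joined on the state name and then encoded as a factor so that a regression treats it as categorical.

```r
# Hypothetical state-to-region lookup (a subset, for illustration)
lookup <- data.frame(
  state    = c("Maine", "Ohio", "Texas", "Oregon"),
  region   = c("Northeast", "Midwest", "South", "West"),
  division = c("New England", "East North Central",
               "West South Central", "Pacific")
)

# Hypothetical county-level rows with made-up vote shares
counties <- data.frame(
  state = c("Ohio", "Texas", "Texas", "Maine", "Oregon"),
  fracR = c(0.60, 0.72, 0.35, 0.45, 0.40)
)

# Left join: every county keeps its row and gains region and division
counties <- merge(counties, lookup, by = "state", all.x = TRUE)

# Encode region as a factor so models treat it as categorical
counties$region <- factor(counties$region)

# Dummy (indicator) columns as a linear model would construct them;
# the first level ("Midwest") is absorbed into the intercept
dummies <- model.matrix(~ region, data = counties)
print(colnames(dummies))
```

Using division instead of region gives finer geographic resolution at the cost of more parameters (eight indicator columns rather than three).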

Moving down to the county level, we plot the voting results by county.

# A tibble: 4 × 12
  state    state.po county fips  votesR votesD totalvotes  fracR partywonR fracD
  <chr>    <chr>    <chr>  <chr>  <dbl>  <dbl>      <dbl>  <dbl>     <dbl> <dbl>
1 Alaska   AK       Valde… 02261   2618   1226       3844 0.681          1 0.319
2 Alaska   AK       Wade … 02270    415   1180       1595 0.260          0 0.739
3 Missouri MO       Kansa… 36000  24654  97735     122389 0.192          0 0.760
4 South D… SD       Oglal… 46113    241   2510       2751 0.0830         0 0.864
# ℹ 2 more variables: frac_diff <dbl>, winning_party <chr>
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -1921992 ymin: -2158795 xmax: 60832.6 ymax: -141943.4
Projected CRS: NAD27 / US National Atlas Equal Area
# A tibble: 6 × 5
  fips  abbr  full         county                                           geom
* <chr> <chr> <chr>        <chr>                              <MULTIPOLYGON [m]>
1 02063 AK    Alaska       Chugach Census Area      (((-1476669 -2101298, -1469…
2 02066 AK    Alaska       Copper River Census Area (((-1457015 -2063407, -1443…
3 02158 AK    Alaska       Kusilvak Census Area     (((-1921992 -2073996, -1919…
4 15005 HI    Hawaii       Kalawao County           (((-529505.2 -1982971, -526…
5 31103 NE    Nebraska     Keya Paha County         (((-16148.74 -222421.3, -16…
6 46102 SD    South Dakota Oglala Lakota County     (((-242200.2 -150471.8, -23…

After looking up information on the Alaska government website, we found that in 2019 the Valdez-Cordova Census Area was abolished and replaced by the Chugach Census Area and the Copper River Census Area. Also, in 2015, the Wade Hampton Census Area was renamed the Kusilvak Census Area, and its FIPS code was changed from 02270 to 02158. We will deal with this later.
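One possible way to handle the renamed area is to recode its FIPS code before joining the election data to the map data. The sketch below is an assumption about how this could be done, not the method used later in this report. The data frame `votes`, the vector `map_fips`, and the row for code 01001 are illustrative. Only the one-to-one rename is recoded, because the Valdez-Cordova split is one-to-many and needs a separate decision.

```r
# Toy election rows; the 01001 row is made up for illustration
votes <- data.frame(
  fips   = c("02270", "02261", "01001"),
  votesR = c(415, 2618, 100),
  stringsAsFactors = FALSE
)

# FIPS codes present in the (hypothetical) map data
map_fips <- c("02158", "02063", "02066", "01001")

# Old code -> new code, for one-to-one renames only
fips_recode <- c("02270" = "02158")

# Replace old codes where a recode exists
hit <- votes$fips %in% names(fips_recode)
votes$fips[hit] <- unname(fips_recode[votes$fips[hit]])

# Codes still unmatched (here the split Valdez-Cordova area, 02261)
unmatched <- setdiff(votes$fips, map_fips)
print(unmatched)
```

Keeping FIPS codes as character strings preserves the leading zero, which would be lost if they were stored as numbers.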

Histograms of all numerical variables at the county level

From the histograms, we can see that most of the predictors are right-skewed: the majority of counties have relatively low values for these indicators, while a small number of counties have very large values. This is expected, as socio-economic indicators such as median household income, educational attainment, and racial demographics tend to vary significantly across counties.

This also suggests that data transformations, such as a log transformation, may be necessary to address the skewness in the variables.

Looking at the response variable fracR, we can see that the distribution is not symmetric, with a peak around 0.74. This indicates that the Republican vote share exceeded one half in most counties.

Compared with the county-level histograms, the state-level histograms show reduced but still substantial skewness for most indicators.
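The effect of a log transformation on a right-skewed variable can be illustrated with simulated data. This is a sketch under stated assumptions: the variable `x` is drawn from a log-normal distribution rather than taken from the county data, and `skew` is a small helper defined here that computes the moment-based sample skewness.

```r
set.seed(1)

# Simulated right-skewed indicator (not the report's data)
x <- rlnorm(3000, meanlog = 10, sdlog = 1)

# Moment-based sample skewness: third central moment over sd cubed
skew <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

skew_raw <- skew(x)        # strongly positive for right-skewed data
skew_log <- skew(log(x))   # close to zero after the transformation

cat("Skewness before:", round(skew_raw, 2), "\n")
cat("Skewness after: ", round(skew_log, 2), "\n")

# For indicators that contain zeros, log(0) is -Inf;
# log1p(v) = log(1 + v) is a common alternative
zero_safe <- log1p(c(0, 9, 99))
```

The log transformation only applies to strictly positive values, so each indicator should be checked for zeros or negative values before it is transformed.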

We noticed that Kansas City, Missouri (FIPS code 36000 in the election data) is missing from the county_facts dataset. We will remove this row from the merged dataset.
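A check of this kind can be sketched in base R by comparing the FIPS codes of the two tables. The toy data frames below stand in for the election data and the county_facts dataset. Apart from the code 36000 for Kansas City, the FIPS codes, county names, and the `population` column are hypothetical.

```r
# Toy election data; only the Kansas City code comes from the output above
votes <- data.frame(
  fips   = c("00001", "36000", "00002"),
  county = c("County A", "Kansas City", "County B"),
  stringsAsFactors = FALSE
)

# Toy stand-in for county_facts, with no row for Kansas City
county_facts <- data.frame(
  fips       = c("00001", "00002"),
  population = c(1000, 2000),
  stringsAsFactors = FALSE
)

# Rows of the election data with no match in county_facts
missing_rows <- votes[!votes$fips %in% county_facts$fips, ]
print(missing_rows)

# An inner join keeps only matched rows, which drops the missing one
merged <- merge(votes, county_facts, by = "fips")
print(merged)
```

Listing the unmatched rows before merging makes the removal explicit, rather than letting the join drop them silently.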

Now, let’s look at the averaged socio-economic indicators at the county level for each party. We start with population:

Demographic:

Educational Attainment

Housing

Housing-2

Employment-1

Employment-2

Sales: